AITopics | numerical rank

The Tunnel Effect: Building Data Representations in Deep Neural Networks

Neural Information Processing SystemsApr-30-2026, 06:56:11 GMT

Deep neural networks are widely known for their remarkable effectiveness across various tasks, with the consensus that deeper networks implicitly learn more complex data representations. This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. The initial layers create linearlyseparable representations, while the subsequent layers, which we refer to as the tunnel, compress these representations and have a minimal impact on the overall performance. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. Its depth depends on the relation between the network's capacity and task complexity. Furthermore, we show that the tunnel degrades out-of-distribution generalization and discuss its implications for continual learning.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Experimental setup

Neural Information Processing SystemsFeb-17-2026, 22:21:53 GMT

In this section, we detail the model architectures examined in the experiments and list all hyperpa-rameters used in the experiments. Both architectures consist of five stages, each consisting of a combination of convolutional layers with ReLU activation and max pooling layers. The base number of channels in consecutive stages for VGG architectures equals 64, 128, 256, 512, and 512. The subsequent stages are composed of residual blocks. In the case of ResNets, we report the results for the'conv2' layers.

artificial intelligence, machine learning, out-of-distribution performance, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

f249db9ab5975586f36df46f8958c008-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 22:21:50 GMT

artificial intelligence, machine learning, representation, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
Europe > Poland > Masovia Province > Warsaw (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Cognitive Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

d5cd70b708f726737e2ebace18c3f71b-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 03:53:11 GMT

dimension, matrix, neural network, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

d5cd70b708f726737e2ebace18c3f71b-Paper-Conference.pdf

Neural Information Processing SystemsFeb-12-2026, 03:53:07 GMT

dimension, matrix, neural network, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Modular Jets for Supervised Pipelines: Diagnosing Mirage vs Identifiability

Sanyal, Suman

arXiv.org Machine LearningDec-8-2025

Classical supervised learning evaluates models primarily via predictive risk on hold-out data. Such evaluations quantify how well a function behaves on a distribution, but they do not address whether the internal decomposition of a model is uniquely determined by the data and evaluation design. In this paper, we introduce \emph{Modular Jets} for regression and classification pipelines. Given a task manifold (input space), a modular decomposition, and access to module-level representations, we estimate empirical jets, which are local linear response maps that describe how each module reacts to small structured perturbations of the input. We propose an empirical notion of \emph{mirage} regimes, where multiple distinct modular decompositions induce indistinguishable jets and thus remain observationally equivalent, and contrast this with an \emph{identifiable} regime, where the observed jets single out a decomposition up to natural symmetries. In the setting of two-module linear regression pipelines we prove a jet-identifiability theorem. Under mild rank assumptions and access to module-level jets, the internal factorisation is uniquely determined, whereas risk-only evaluation admits a large family of mirage decompositions that implement the same input-to-output map. We then present an algorithm (MoJet) for empirical jet estimation and mirage diagnostics, and illustrate the framework using linear and deep regression as well as pipeline classification.

decomposition, perturbation, pipeline, (16 more...)

arXiv.org Machine Learning

2512.05638

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
Asia > India > Goa (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Data Science (0.93)

Add feedback

A Experimental setup

Neural Information Processing SystemsOct-9-2025, 11:37:24 GMT

In this section, we detail the model architectures examined in the experiments and list all hyperpa-rameters used in the experiments. Both architectures consist of five stages, each consisting of a combination of convolutional layers with ReLU activation and max pooling layers. The base number of channels in consecutive stages for VGG architectures equals 64, 128, 256, 512, and 512. The subsequent stages are composed of residual blocks. In the case of ResNets, we report the results for the'conv2' layers.

artificial intelligence, machine learning, out-of-distribution performance, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Understanding Transformers for Time Series: Rank Structure, Flow-of-ranks, and Compressibility

Yu, Annan, Maddix, Danielle C., Han, Boran, Zhang, Xiyuan, Ansari, Abdul Fatir, Shchur, Oleksandr, Faloutsos, Christos, Wilson, Andrew Gordon, Mahoney, Michael W., Wang, Yuyang

arXiv.org Artificial IntelligenceOct-7-2025

Transformers are widely used across data modalities, and yet the principles distilled from text models often transfer imperfectly to models trained to other modalities. In this paper, we analyze Transformers through the lens of rank structure. Our focus is on the time series setting, where the structural properties of the data differ remarkably from those of text or vision. We show that time-series embeddings, unlike text or vision, exhibit sharply decaying singular value spectra: small patch sizes and smooth continuous mappings concentrate the data into low-rank subspaces. From this, we prove that the associated $Q/K/V$ projections admit accurate low-rank approximations, and that attention layers become compressible in proportion to the decay of the embedding spectrum. We introduce the concept of flow-of-ranks, a phenomenon by which nonlinear mixing across depth inflates the rank, explaining why early layers are most amenable to compression and why ranks grow with depth. Guided by these theoretical and empirical results, we use these insights to compress Chronos, a large time series foundation model, achieving a reduction of $65\%$ in inference time and $81\%$ in memory, without loss of accuracy. Our findings provide principled guidance for allocating width, depth, and heads in time series foundation models, and for exploiting their inherent compressibility.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.03358

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.66)

Technology: